Proceedings: Evolution of Large Scale Structure - Garching, August 1998 



THE COSMIC DISTRIBUTION OF CLUSTERING 



S. Colombi¹, I. Szapudi² and the VIRGO consortium³
¹ Institut d'Astrophysique de Paris, CNRS, 98bis bd Arago, F-75014 Paris, France
² University of Durham, Department of Physics, South Road, Durham DH1 3LE, UK
³ See, e.g., http://star-www.dur.ac.uk/~frazerp/virgo/people.htm.



ABSTRACT. For a given statistic A, the cosmic distribution function, Υ(Ã), is the probability of measuring a value Ã in a finite galaxy catalog. For statistics related to counts-in-cells, such as the factorial moments, F_k, the averaged correlation function, ξ̄, and the cumulants, S_N, the functions Υ(F_k), Υ(ξ̄) and Υ(S_N) were measured in a large τCDM simulation. This N-body experiment simulates almost the full "Hubble Volume" of the universe; thus, for the first time, it allowed an accurate analysis of the cosmic distribution function and, in particular, of its variance, (ΔA)², the cosmic error. The resulting detailed knowledge of the shape of Υ is crucial for likelihood analyses. The measured cosmic error agrees remarkably well with the theoretical predictions of Szapudi & Colombi (1996) and Szapudi, Bernardeau & Colombi (1998) in the weakly nonlinear regime, while the predictions are slightly above the measurements in the highly nonlinear regime. When the relative cosmic error is small, (ΔA/A)² ≪ 1, the function Υ is nearly Gaussian. When (ΔA/A)² approaches unity or is larger, Υ(Ã) is increasingly skewed and well approximated by a lognormal distribution for A = F_k or A = ξ̄. The measured cumulants accurately follow the perturbation theory predictions in the weakly nonlinear regime. Extended perturbation theory is an excellent approximation over the full available dynamic range.



1 Introduction 

To confront theory with observations, an accurate understanding of errors is essential. The state-of-the-art maximum likelihood approach requires full information on the distribution function of the measurements; the scatter alone is insufficient if the underlying error distribution is non-Gaussian. As shown later, this is often the case in large-scale structure studies. What follows focuses on statistics related to counts-in-cells (CIC), in particular the factorial moments, A = F_k (e.g., Szapudi & Szalay, 1993), the averaged two-point correlation function, A = ξ̄, and the higher order cumulants, A = S_N (e.g., Balian & Schaeffer, 1989). We are concerned with the question: what is the probability of measuring a value Ã of A in a finite galaxy catalog, S, with volume V? The CIC indicators from a particular realization are influenced by several statistical effects (e.g., Szapudi & Colombi 1996, hereafter SC), which can be approximately separated:



1. Finite volume effects are due to fluctuations of the density field on scales larger than the catalog.

2. Edge effects are caused by the uneven statistical weight given to objects near the survey boundary.

3. Discreteness effects are related to the finite number of galaxies in the catalog, which sample the underlying continuous field.

If measurements were available in a large number of realizations, S_i, 1 ≤ i ≤ C_S, the distribution of the measurements, i.e., the cosmic distribution function Υ(Ã), could be estimated directly. In most cases this is unachievable in practice, but Υ(Ã) can still be evaluated theoretically or via simulations. An important special case is the Gaussian distribution, which is fully characterized by its average, ⟨Ã⟩ = A, and its variance, (ΔA)², the cosmic error; (ΔA/A)² is the relative cosmic error.
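When many realizations are available, as with the simulation subsamples used below, Υ can be estimated directly as a normalized histogram of the per-realization measurements, together with its mean and variance. The following is a minimal NumPy sketch, assuming one scalar measurement Ã per realization; the function name `cosmic_distribution` and the synthetic lognormal input are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

def cosmic_distribution(measurements, bins=30):
    """Histogram estimate of the cosmic distribution function Upsilon(A~),
    given one measurement A~ per realization (e.g. per subsample S_i).

    Returns bin centers, the normalized histogram, the mean (estimate of A),
    the variance (estimate of (Delta A)^2) and the relative cosmic error
    (Delta A / A)^2.
    """
    a = np.asarray(measurements, dtype=float)
    density, edges = np.histogram(a, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    mean = a.mean()
    var = a.var(ddof=1)  # unbiased sample variance over the realizations
    return centers, density, mean, var, var / mean**2

# Usage with synthetic, lognormally scattered "measurements" (a stand-in,
# not simulation data), one per subsample, C_S = 16**3 of them.
rng = np.random.default_rng(1)
fake = rng.lognormal(mean=0.0, sigma=0.3, size=16**3)
centers, density, mean, var, rel_err2 = cosmic_distribution(fake)
```

With only C_S realizations, the histogram bins themselves carry a sampling error, which is why the measured points of a cosmic distribution function come with their own error bars.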

This contribution concentrates on the properties of Υ, presenting a fraction of the results of a more extended article by Colombi, Szapudi, et al. (1998) based on the latest N-body experiment of the VIRGO consortium. The next section compares the measured cumulants, S_N, 3 ≤ N ≤ 10, with perturbation theory (PT, e.g., Juszkiewicz et al. 1993; Bernardeau, 1994) and extended perturbation theory predictions (EPT, e.g., Colombi et al. 1996). Section 3 compares the measured cosmic error with the theoretical predictions of SC and of Szapudi, Bernardeau & Colombi (1998, hereafter SBC). Finally, Section 4 examines the shape of the cosmic distribution function Υ.



2 The Underlying Statistics 

The algorithm of Szapudi et al. (1998) was employed to extract CICs from a τCDM simulation with one billion particles in a cubic box of 2000 h⁻¹ Mpc (see the contribution of A. Evrard in this volume). Most of the considerations of this work are quite insensitive to the particular cosmological model, so similar results are expected for all currently fashionable CDM variants. The virtual (periodic) universe was divided into C_S = 16³ adjacent cubic subsamples, each representing a possible realization of our visible, local universe. In the full simulation and in each subsample S_i, 512³ cubical sampling cells of size ℓ were placed, covering the scale range 0.24 h⁻¹ Mpc ≤ ℓ ≤ 250 h⁻¹ Mpc. The combined subsamples probed the tail of the CIC probability distribution function, P_N, extremely well, down to P_N ≈ 1.8 × 10⁻¹². The measurement in the whole sample is somewhat less accurate, but still state of the art, reaching P_N ≈ 7.5 × 10⁻⁹.
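The counts-in-cells measurement can be sketched as follows: cubic cells are dropped in a periodic box, particles are counted with periodic wrapping, and the factorial moments then give shot-noise-corrected estimates of ξ̄ and S₃ through F_k = N̄ᵏ⟨(1+δ)ᵏ⟩, which holds for Poisson sampling of a continuous field. This is a brute-force NumPy illustration under those assumptions, not the algorithm of Szapudi et al. (1998), which handles 512³ cells far more efficiently; all function names are hypothetical, and N̄ is estimated here by the measured F₁.

```python
import numpy as np

def counts_in_cells(positions, box, cell, corners):
    """Number of particles in cubic cells of side `cell` whose lower corners
    are `corners` (shape (m, 3)), in a periodic cube of side `box`."""
    counts = np.empty(len(corners), dtype=np.int64)
    for j, c in enumerate(corners):
        # offset of every particle from the cell corner, wrapped into [0, box)
        d = np.mod(positions - c, box)
        counts[j] = np.count_nonzero(np.all(d < cell, axis=1))
    return counts

def cic_pdf(counts):
    """P_N: fraction of cells containing exactly N particles, N = 0, 1, ..."""
    return np.bincount(counts) / len(counts)

def factorial_moment(counts, k):
    """F_k = <N (N-1) ... (N-k+1)>, averaged over the cells."""
    n = np.asarray(counts, dtype=float)
    prod = np.ones_like(n)
    for i in range(k):
        prod *= n - i
    return prod.mean()

def xi_bar_and_s3(counts):
    """Shot-noise-corrected xi_bar and S_3 from F_1, F_2, F_3, assuming
    Poisson sampling (F_k = Nbar^k <(1+delta)^k>) and Nbar estimated by F_1."""
    f1, f2, f3 = (factorial_moment(counts, k) for k in (1, 2, 3))
    m2, m3 = f2 / f1**2, f3 / f1**3   # <(1+delta)^2>, <(1+delta)^3>
    xi = m2 - 1.0                     # <delta^2>
    xi3 = m3 - 3.0 * m2 + 2.0         # <delta^3>
    return xi, xi3 / xi**2

# Usage on an unclustered toy catalog (uniform random positions).
rng = np.random.default_rng(0)
pos = rng.uniform(0.0, 100.0, size=(20000, 3))
corners = rng.uniform(0.0, 100.0, size=(2000, 3))
counts = counts_in_cells(pos, box=100.0, cell=10.0, corners=corners)
p_n = cic_pdf(counts)
```

Working with factorial moments is convenient because, under the Poisson sampling assumption, they remove the discreteness contribution without any explicit shot-noise subtraction.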

Figure 1 shows the cumulants as functions of the variance, representing the most accurate measurement over the widest dynamic range to date. In agreement with previous studies (see, e.g., Gaztañaga & Baugh, 1995; Szapudi et al. 1998), the results match PT extremely well in the weakly nonlinear regime, where ξ̄ ≲ 1, and EPT is an excellent approximation over the full available dynamic range. The 1-loop calculations of Fosalba & Gaztañaga (1998), based on the spherical model and not displayed in Figure 1, show good agreement with the numerical results up to ξ̄ ≈ 1.
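The PT curves of Figure 1 involve logarithmic derivatives of ξ̄. The sketch below uses the standard tree-level top-hat results, as given in, e.g., Bernardeau (1994): S₃ = 34/7 + γ₁ and S₄ = 60712/1323 + 62γ₁/3 + 7γ₁²/3 + 2γ₂/3, with γ₁ = d ln ξ̄/d ln ℓ and γ₂ = dγ₁/d ln ℓ. The EPT-style step, in the spirit of Colombi et al. (1996), fits an effective γ₁ to a measured S₃ and reuses it for S₄; setting γ₂ = 0 there (power-law spectrum) is an assumption of this illustration, and the function names are hypothetical.

```python
def s3_tree(g1):
    """Tree-level PT skewness for top-hat cells; g1 = d ln(xi_bar) / d ln(l)."""
    return 34.0 / 7.0 + g1

def s4_tree(g1, g2=0.0):
    """Tree-level PT kurtosis for top-hat cells; g2 = d g1 / d ln(l)."""
    return 60712.0 / 1323.0 + 62.0 / 3.0 * g1 + 7.0 / 3.0 * g1**2 + 2.0 / 3.0 * g2

def ept_s4(s3_measured):
    """EPT-style sketch: choose the effective g1 that reproduces a measured
    S_3, then reuse it in the tree-level S_4 (assuming g2 = 0)."""
    g1_eff = s3_measured - 34.0 / 7.0
    return s4_tree(g1_eff, 0.0)

# A linear power-law spectrum P(k) ~ k^n gives xi_bar ~ l^-(n+3), so g1 = -(n+3).
n = -1.0
print(s3_tree(-(n + 3.0)), s4_tree(-(n + 3.0)))
```

Because EPT only replaces the spectral index by an effective one, it reduces to tree-level PT whenever the measured S₃ already equals the PT value.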



3 The Cosmic Error 

Figure 2 shows the measured cosmic error together with the predictions of SC and SBC. The various theoretical models agree extremely well with the measurements in the weakly nonlinear regime (at least for the F_k's) and tend to be slightly higher than the numerical estimates in the nonlinear regime. EPT yields the closest match to the data. Taken at face value, these results indicate that the hierarchical assumption for the joint moments used by SC and SBC is inaccurate on the smallest scales and should be corrected in the error calculations, where bivariate distributions play a crucial role.

It is interesting, although not surprising, to note that factorial moments and cumulants of the same order behave differently in terms of errors. For example, on large scales, where the cosmic error is dominated by edge effects (SC), the cumulants ξ̄ and S₃ have larger scatter than the full moments F₂ and F₃; the reverse is true on small scales. The situation is exactly analogous to the "integral constraint" problem.
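A rough way to see why ξ̄ is noisier than F₂ on large scales: if the mean count N̄ is treated as exactly known (an idealization; in practice it is measured in each subsample), first-order error propagation gives

```latex
\bar\xi = \frac{F_2}{\bar N^2} - 1
\quad\Longrightarrow\quad
\frac{\Delta\bar\xi}{\bar\xi}
  = \frac{\Delta F_2}{\bar N^2\,\bar\xi}
  = \frac{\Delta F_2}{F_2}\,\frac{1+\bar\xi}{\bar\xi}
```

The amplification factor (1+ξ̄)/ξ̄ is large when ξ̄ ≪ 1, i.e., on large scales, and tends to unity when ξ̄ ≫ 1. This simple argument does not address the reversal on small scales, which involves the fluctuations of the measured N̄ from one subsample to the next.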



4 The Cosmic Distribution Function

Figure 3 shows Υ(Ã) for A = F₂, ξ̄, F₃ and S₃, on the scales indicated on each panel, as a function of the deviation from the mean normalized by the rms cosmic error, δA/ΔA, where δA = Ã − A. The measurement is compared with the Gaussian limit, the lognormal distribution (see, e.g., Coles & Jones 1991), and a generalized version of it:



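$$\Upsilon(\tilde{A}) = \frac{s}{\Delta A\, x\,\sqrt{2\pi\eta}}\,\exp\left[-\frac{(\ln x + \eta/2)^2}{2\eta}\right] \qquad (1)$$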



with x = s(Ã − A)/ΔA + 1 and η = ln(1 + s²). The adjustable parameter s is chosen such that the probability distribution function (1) has the same average, variance and skewness, S = s³ + 3s, as the measured Υ(Ã). Of the three choices, function (1), with its three adjustable parameters, was designed to yield the best fit to the measurements (and it does!). Moreover, this approximation appears to be sufficiently accurate to describe the shape of Υ(Ã), especially in the large-Ã tail. Therefore, a third order theory, i.e., one predicting the mean, variance and skewness of Υ, is necessary to determine the shape of Υ(Ã) in the studied dynamic range.

The box size was conveniently chosen for the compressed output format provided by Adrian Jenkins to speed up processing.

Figure 1. The cumulants, S_N, measured in the whole τCDM simulation are shown as functions of the variance ξ̄. The dots, long dashes and short dashes correspond to PT with only the terms involving the first-order logarithmic derivative of ξ̄, PT with terms up to the second-order logarithmic derivative of ξ̄, and EPT (see, e.g., Bernardeau 1994 and Colombi et al. 1996), respectively. The lower and upper panels correspond to 3 ≤ N ≤ 5 and to 6 ≤ N ≤ 10, respectively. The S_N's increase with N. Finally, the filled and open symbols correspond to the results from the full simulation and from the combination of all the subsamples S_i, respectively.

The amount of skewness of the dotted curves in Figure 3 is an indicator of the magnitude of the cosmic error, since the skewness of the lognormal distribution is S = (ΔA/A)³ + 3ΔA/A. As we see, the function Υ(Ã) is nearly Gaussian when the cosmic error is small and in general becomes increasingly skewed with ΔA/A. For the factorial moments and ξ̄, the cosmic distribution function is nearly lognormal. Since a lognormal distribution is fully determined by its mean and variance, the theory of SC and SBC, which predicts the cosmic error, is sufficient to characterize the shape of the distribution function for these estimators.
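Both the lognormal and the generalized form (1) can be fixed from the measured moments of Υ. Since s³ + 3s is strictly increasing, s follows from the skewness S as the unique real root of a cubic (Cardano's formula). The following NumPy sketch evaluates equation (1) and checks its moments numerically; it assumes S > 0 (a right-skewed distribution, as in Figure 3), and the function names are hypothetical.

```python
import numpy as np

def s_from_skewness(S):
    """Real root of s**3 + 3*s = S (Cardano). The cubic is strictly
    increasing, so the root is unique."""
    r = np.sqrt(S**2 / 4.0 + 1.0)
    return np.cbrt(S / 2.0 + r) + np.cbrt(S / 2.0 - r)

def extended_lognormal_pdf(a, A, dA, S):
    """Equation (1): shifted lognormal with mean A, rms dA and skewness S > 0."""
    s = s_from_skewness(S)
    if s <= 0:
        raise ValueError("this sketch assumes a positive skewness S")
    eta = np.log1p(s**2)
    x = s * (np.atleast_1d(np.asarray(a, dtype=float)) - A) / dA + 1.0
    pdf = np.zeros_like(x)
    ok = x > 0  # the distribution vanishes for x <= 0
    lx = np.log(x[ok])
    pdf[ok] = (s / (dA * x[ok] * np.sqrt(2.0 * np.pi * eta))
               * np.exp(-(lx + eta / 2.0) ** 2 / (2.0 * eta)))
    return pdf

# Usage: moments of the curve, checked by direct summation on a fine grid.
grid = np.linspace(-2.0, 10.0, 200001)
step = grid[1] - grid[0]
pdf = extended_lognormal_pdf(grid, A=1.0, dA=0.3, S=1.0)
norm = pdf.sum() * step
mean = (grid * pdf).sum() * step
var = ((grid - mean) ** 2 * pdf).sum() * step
```

Setting S = (ΔA/A)³ + 3ΔA/A gives s = ΔA/A, so that x = Ã/A and equation (1) reduces to the ordinary lognormal with mean A and variance (ΔA)², the case determined by the first two moments alone.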
Figure 2. The relative cosmic error is shown as a function of scale for the factorial moments (left panels) and for ξ̄ and the cumulants (right panels). The symbols represent the cosmic scatter obtained from the measurements in S_i, i = 1, ..., 16³. The dots, dashes and long dashes display the theoretical predictions based on the hierarchical models of Szapudi & Szalay (1993), Bernardeau & Schaeffer (1992), and EPT (SBC), respectively. Note that when the cosmic error approaches unity or is larger, the theoretical calculation is not expected to be valid in the right panels, as it relies on a Taylor expansion in terms of the relative error.



galactic Astronomy and Cosmology at Durham. The figures of this paper were extracted from Colombi, Szapudi et al. (1998).
Figure 3. The cosmic distribution function is shown for F₂ and ξ̄ (6 upper panels), and for F₃ and S₃ (6 lower panels). Three scales are considered in each case, indicated in h⁻¹ Mpc in the upper right corner of each panel. The symbols represent the measurements. The error bars correspond to the measurement error due to the fact that we use C_S = 16³ samples as realizations of our local universe. On each panel, the solid curve, the dots and the dashes correspond to the Gaussian, lognormal and extended lognormal distributions, as discussed in the text [equation (1)].



